The Design and Implementation of Low-Latency Prediction Serving Systems
Machine learning is being deployed in a growing number of applications that demand real-time, accurate, and cost-efficient predictions under heavy query load. These applications employ a variety of machine learning frameworks and models, often composing several models within the same application. However, most machine learning frameworks and systems are optimized for model training rather than deployment. In this thesis, I discuss three prediction serving systems designed to meet the needs of modern interactive machine learning applications. The key idea in this work is to utilize a decoupled, layered design that interposes systems on top of training frameworks to build low-latency, scalable serving systems. Velox introduced this decoupled architecture to enable fast online learning and model personalization in response to feedback. Clipper generalized this system architecture to be framework-agnostic and introduced a set of optimizations to reduce and bound prediction latency and improve prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. And InferLine provisions and manages the individual stages of prediction pipelines to minimize cost while meeting end-to-end tail latency constraints.
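To make the decoupled, layered design concrete, the following is a minimal sketch of a framework-agnostic serving layer in Python. The ModelContainer and ServingLayer names are illustrative assumptions, not Clipper's actual API, which additionally performs adaptive batching, caching, and latency-aware optimizations:

# Minimal sketch of a framework-agnostic serving layer (hypothetical
# interface, not Clipper's actual API). Each trained model, whatever
# framework produced it, is wrapped behind one common predict() method,
# so the layer above can route, batch, and cache without knowing the
# framework underneath.
from abc import ABC, abstractmethod
from typing import Any, List

class ModelContainer(ABC):
    """One container per model; the serving layer sees only this interface."""
    @abstractmethod
    def predict(self, inputs: List[Any]) -> List[Any]:
        ...

class SklearnContainer(ModelContainer):
    """Example wrapper assuming a scikit-learn-style fitted model."""
    def __init__(self, fitted_model):
        self.model = fitted_model
    def predict(self, inputs):
        return list(self.model.predict(inputs))

class ServingLayer:
    """Interposed on top of the containers: here it only routes queries;
    a real serving system would also add batching, caching, and timeouts."""
    def __init__(self):
        self.containers = {}
    def register(self, name: str, container: ModelContainer):
        self.containers[name] = container
    def query(self, name: str, inputs: List[Any]) -> List[Any]:
        return self.containers[name].predict(inputs)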
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data has driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and a complicated programming model.
To address these challenges we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves performance
comparable to specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX achieves a balance between
expressiveness, performance, and ease of use.
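The claim that graph-parallel operators are simple enough to be cast in relational algebra can be illustrated with a toy, single-process Python analogue (not GraphX's actual Scala API): one Pregel-style superstep is a join of the edge table with the vertex-state table, followed by a group-by aggregation of messages and a map over vertices.

# Toy analogue of a Pregel superstep expressed as relational operations
# (join + group-by + map). Illustrative only; GraphX implements this on
# Spark with distributed tables and automatic join rewrites.
from collections import defaultdict

vertices = {1: 1.0, 2: 1.0, 3: 1.0}          # vertex table: id -> state
edges = [(1, 2), (2, 3), (3, 1), (1, 3)]     # edge table: (src, dst)

def superstep(vertices, edges, send, merge, apply):
    # 1. "join": attach each source vertex's state to its out-edges
    messages = defaultdict(list)
    for src, dst in edges:
        messages[dst].append(send(src, vertices[src]))
    # 2. "group-by + aggregate": combine incoming messages per vertex
    combined = {v: merge(m) for v, m in messages.items()}
    # 3. "map": compute each vertex's new state
    return {v: apply(s, combined.get(v, 0.0)) for v, s in vertices.items()}

out_deg = defaultdict(int)
for src, _ in edges:
    out_deg[src] += 1

# One PageRank-flavored superstep with damping factor 0.85.
ranks = superstep(
    vertices, edges,
    send=lambda src, rank: rank / out_deg[src],
    merge=sum,
    apply=lambda old, msg: 0.15 + 0.85 * msg,
)
print(ranks)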
Mitigating the Performance Impact of Network Failures in Public Clouds
Some faults in data center networks require hours to days to repair because
they may need reboots, re-imaging, or manual work by technicians. To reduce
traffic impact, cloud providers mitigate the effect of faults, for
example, by steering traffic to alternate paths. The state-of-the-art in automatic
network mitigations uses simple safety checks and proxy metrics to determine
mitigations. SWARM, the approach described in this paper, can pick orders of
magnitude better mitigations by estimating end-to-end connection-level
performance (CLP) metrics. At its core is a scalable CLP estimator that quickly
ranks mitigations with high fidelity and, on failures observed at a large cloud
provider, outperforms the state-of-the-art by over 700× in some cases.
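The core idea, picking mitigations by estimated end-to-end impact rather than proxy metrics, can be sketched as follows; the Mitigation type and estimate_clp function below are assumed placeholders standing in for SWARM's scalable CLP estimator, not its implementation.

# Illustrative sketch: rank candidate mitigations by the connection-level
# performance (CLP) each would yield if applied. The names below are
# assumptions, not SWARM's code; the paper's contribution is a fast,
# high-fidelity estimator filling the role of estimate_clp here.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Mitigation:
    name: str
    apply: Callable[[Dict], Dict]   # network state -> state with mitigation applied

def rank_mitigations(network: Dict,
                     candidates: List[Mitigation],
                     estimate_clp: Callable[[Dict], float]) -> List[Mitigation]:
    """Return candidates ordered best-first by estimated CLP.

    A safety-check-and-proxy-metric approach would merely filter the
    candidates; estimating end-to-end CLP lets the controller rank them
    and pick the best one.
    """
    scored = [(estimate_clp(m.apply(network)), m) for m in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # higher CLP is better
    return [m for _, m in scored]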
Data Analytics Service Composition and Deployment on Edge Devices
Data analytics on edge devices has seen rapid growth in research, in industry, and in many aspects of daily life. The topic still faces many challenges, such as the limited computation resources of edge devices. In this paper, we further identify two main challenges: the composition and deployment of data analytics services on edge devices. We present the Zoo system to address these two challenges: on one hand, it provides a simple and concise domain-specific language to enable easy and type-safe composition of different data analytics services; on the other, it utilises multiple deployment backends, including Docker containers, JavaScript, and MirageOS, to accommodate heterogeneous edge deployment environments. We show the expressiveness of Zoo with a use case and thoroughly compare the performance of the different deployment backends in our evaluation.
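A minimal Python analogue suggests what such composition looks like; Zoo itself is built for the OCaml/Owl ecosystem, so the Service class and the >> operator below are illustrative assumptions rather than Zoo's actual DSL.

# Minimal analogue of type-safe service composition: each service wraps a
# function, and >> chains services into a pipeline. Illustrative only;
# Zoo's DSL is OCaml-based and also handles packaging and deployment.
from typing import Any, Callable

class Service:
    def __init__(self, name: str, fn: Callable[[Any], Any]):
        self.name, self.fn = name, fn

    def __rshift__(self, other: "Service") -> "Service":
        # Compose: the output of self feeds the input of other.
        return Service(f"{self.name} >> {other.name}",
                       lambda x: other.fn(self.fn(x)))

    def __call__(self, x: Any) -> Any:
        return self.fn(x)

# Hypothetical example: preprocess an image, then classify it.
resize = Service("resize", lambda img: f"resized({img})")
classify = Service("classify", lambda t: f"label({t})")
pipeline = resize >> classify
print(pipeline.name)         # resize >> classify
print(pipeline("cat.png"))   # label(resized(cat.png))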